Automatic Profiling of Twitter Users Based on Their Tweets: Notebook for PAN at CLEF 2015
Authors
Abstract
In this paper we describe our approach to the PAN Author Profiling task. We introduce a novel way of computing an author's type/token ratio and show that, although strong correlations between high extroversion and low type/token ratios have been observed in the past, this ratio is not necessarily a strong indicator of extroversion. A person's writing is influenced by all 7 features that this task requires us to identify automatically (gender, age, and the Big Five personality traits). We therefore used this ratio, together with term frequency-inverse document frequency (tf-idf) matrices, in all 7 subtasks and on all 4 corpora, and obtained good results.
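The abstract does not spell out the novel type/token computation, so the sketch below uses only the classic ratio (distinct word types divided by total tokens) and appends it to a plain tf-idf vector with idf = log(N / df). It is a minimal sketch under those assumptions, not the authors' method; the whitespace tokenizer and the `__ttr__` feature key are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical lowercase whitespace tokenizer; real tweet tokenization
    # (hashtags, mentions, URLs, emoji) would need more care.
    return text.lower().split()

def type_token_ratio(tokens):
    # Classic TTR: number of distinct types / number of tokens.
    # This is the standard definition, NOT the paper's novel variant.
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def tfidf_vectors(docs):
    # docs: one token list per author. Returns one dict per author mapping
    # term -> (term count / document length) * log(N / document frequency).
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors

def author_features(docs):
    # Attach each author's TTR to their tf-idf vector as one extra feature
    # under the hypothetical key "__ttr__".
    vectors = tfidf_vectors(docs)
    for tokens, vec in zip(docs, vectors):
        vec["__ttr__"] = type_token_ratio(tokens)
    return vectors
```

For example, `author_features([tokenize("good day good night"), tokenize("hello world hello world")])` gives TTRs of 0.75 and 0.5. A term that occurs for every author receives an idf of 0, so only the TTR and author-specific terms separate such authors.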
Similar resources
UniNE at CLEF 2015 Author Profiling: Notebook for PAN at CLEF 2015
This paper describes and evaluates an effective author profiling model called SPATIUM-L1. The suggested strategy can be readily adapted to tweets written in different languages (such as Dutch, English, Italian, and Spanish). As features, we suggest using the 200 most frequent terms of the query text (isolated words and punctuation symbols). Applying a simple distance measure and loo...
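The description above is truncated after "a simple distance measure", so the following sketch assumes, as the name SPATIUM-L1 suggests, an L1 (Manhattan) distance between relative-frequency vectors built over the query text's most frequent terms, with the query assigned to the nearest labelled profile. The regex tokenizer, the function names, and the nearest-profile decision rule are illustrative assumptions, not the published model.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase words plus single punctuation symbols.
    return re.findall(r"\w+|[^\w\s]", text.lower())

def most_frequent_terms(tokens, k=200):
    # The k most frequent terms (words and punctuation) of the query text.
    return [term for term, _ in Counter(tokens).most_common(k)]

def profile(tokens, vocabulary):
    # Relative frequency of each vocabulary term in a token list.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[t] / total for t in vocabulary]

def l1_distance(p, q):
    # Manhattan distance between two equal-length frequency vectors.
    return sum(abs(a - b) for a, b in zip(p, q))

def nearest_profile(query_tokens, labelled, k=200):
    # labelled: dict mapping a class label -> token list of its training texts.
    # Returns the label whose profile is closest to the query under L1.
    vocab = most_frequent_terms(query_tokens, k)
    q = profile(query_tokens, vocab)
    return min(labelled, key=lambda lab: l1_distance(q, profile(labelled[lab], vocab)))
```

Because the vocabulary is taken from the query text, each comparison is restricted to the terms that characterize that particular author, which keeps the vectors short (at most `k` entries).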
Age, Gender and Personality Recognition using Tweets in a Multilingual setting: Notebook for PAN at CLEF 2015
User generated text on social media sites is a rich source of information that can be used to identify different aspects of their authors. Proper mining of this content provides an automatic way of identifying users which is very valuable for applications that rely on personalisation. In this work, we describe the properties of our multilingual software submitted for PAN2015 which recognizes th...
Segmenting Target Audiences: Automatic Author Profiling using Tweets: Notebook for PAN at CLEF 2015
This paper describes a methodology for author profiling that uses natural language processing and machine learning techniques. We used lexical information in the learning process. For languages without lexicons, we used automatic translation so that this information could still be exploited. Finally, we will discuss how we applied this methodology to the 3rd Author Profiling Task at PA...
Author Profiling of Twitter Users: Notebook for PAN at CLEF 2015
In this paper, we focus on profiling authors by age, gender, and five personality traits. The corpus consists of anonymized Twitter posts in 4 different languages. Our approach combines tf-idf, function words, stylistic features, and text bigrams, and trains an SVM for each task.
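A minimal sketch of how such heterogeneous features might be merged into one sparse feature dictionary per author. It assumes a tiny hypothetical function-word list and two hypothetical stylistic cues (uppercase-letter ratio and hashtag count); the tf-idf component and the SVM itself are omitted, and none of the names come from the paper.

```python
from collections import Counter

# Hypothetical, tiny English function-word list for illustration only.
FUNCTION_WORDS = ["the", "a", "an", "and", "or", "but", "of", "to", "in", "i", "you", "it"]

def function_word_rates(tokens):
    # Share of all tokens taken by each function word.
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {f"fw={w}": counts[w] / total for w in FUNCTION_WORDS}

def bigram_counts(tokens):
    # Raw counts of adjacent word pairs.
    return Counter(f"bg={a}_{b}" for a, b in zip(tokens, tokens[1:]))

def combined_features(raw_text):
    # Merge function-word rates, word bigrams, and two stylistic cues.
    tokens = raw_text.lower().split()
    feats = {}
    feats.update(function_word_rates(tokens))
    feats.update(bigram_counts(tokens))
    letters = [c for c in raw_text if c.isalpha()]
    feats["style=upper_ratio"] = sum(c.isupper() for c in letters) / (len(letters) or 1)
    feats["style=hashtags"] = sum(t.startswith("#") for t in tokens)
    return feats
```

A list of such dictionaries could then be vectorized and passed to a linear SVM, for example with scikit-learn's `DictVectorizer` and `LinearSVC`, training one model per profiling task.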
Topic Models and n-gram Language Models for Author Profiling - Notebook for PAN at CLEF 2015
Author profiling is the task of determining the attributes for a set of authors. This paper presents the design, approach, and results of our submission to the PAN 2015 Author Profiling Shared Task. Four corpora, each in a different language, were provided. Each corpus consisted of collections of tweets for a number of Twitter users whose gender, age and personality scores are known. The task wa...